8 research outputs found

    Aspectos matemáticos da entropia (Mathematical aspects of entropy)

    This work presents an approach to the development of the concept of entropy. We start with a historical perspective on the evolution of the concept. We state the most important properties of the Shannon, Rényi and Tsallis entropies, as well as of the Kullback-Leibler information gain, and give proofs of some of these properties. We present and discuss the axiomatic definitions of the Shannon and Tsallis entropies. We also mention the axiomatic definition of Rényi's information gain, which leads to the definition of the Rényi entropy.
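
    The abstract names four quantities without giving their formulas. As a rough illustration only, the sketch below computes them for a discrete probability distribution using the standard textbook definitions (natural logarithm); the function names are hypothetical and nothing here is taken from the dissertation itself.

```python
import math

def shannon(p):
    """Shannon entropy in nats: -sum p_i ln p_i (zero-probability terms contribute 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1): ln(sum p_i^alpha) / (1 - alpha)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)

def tsallis(p, q):
    """Tsallis entropy of index q (q != 1): (1 - sum p_i^q) / (q - 1)."""
    if q == 1:
        raise ValueError("q must be different from 1")
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)

def kl_divergence(p, r):
    """Kullback-Leibler information gain sum p_i ln(p_i / r_i).

    Assumes r_i > 0 wherever p_i > 0; otherwise the division fails.
    """
    return sum(x * math.log(x / y) for x, y in zip(p, r) if x > 0)
```

    For a uniform distribution over n outcomes, both the Shannon and the Rényi entropy equal ln n, and as the order approaches 1 the Rényi and Tsallis entropies both approach the Shannon entropy.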

    Análise de distribuições de distâncias entre palavras genómicas (Analysis of distributions of distances between genomic words)

    The investigation of DNA has been one of the most developed areas of research in this century and the last. However, there is still a long way to go to fully understand the DNA code. With the increase in sequenced DNA data, mathematical methods play an important role in addressing the need for efficient quantitative techniques for detecting regions of interest and overall characteristics in these sequences. A feature of interest in the study of genomic words is their spatial distribution along a DNA sequence, which can be characterized by the distances between words. Counting such distances provides discrete distributions that can be analyzed from a statistical point of view. In this work we explore the distances between genomic words as a mathematical descriptor of DNA sequences. The main goal is to design, develop and apply statistical methods tailored to these distributions, in order to capture information about the primary and secondary structure of DNA. The characterization of empirical inter-word distance distributions involves the problem of the exponential growth in the number of distributions as the word length increases, which creates a need for data reduction. Moreover, if the data can be validly clustered, the class labels may provide a meaningful description of similarities and differences between sets of distributions. Therefore, we explore the potential of inter-word distance distributions to produce a word clustering that highlights similar patterns of word distributions as well as summarized characteristics of each set of distributions. With the aim of performing comparative studies between genomic sequences and defining species signatures, we deduce exact distributions of inter-word distances under random scenarios. Based on these theoretical distributions, we define genomic signatures able to discriminate between species and to capture their evolutionary relationships. We presume that the study of distribution similarities and the clustering procedure allow the identification of words whose distance distribution differs strongly from a reference distribution or from the global behaviour of the majority of words. One of the key topics of our research is the establishment of procedures that detect distance distributions with atypical behaviour, herein referred to as atypical distributions. In the genomic context, words with an atypical distance distribution may be related to some biological function (motifs). We expect that our results may be used to provide some sort of classification of sequences, identifying evolutionary patterns and allowing for the prediction of functional properties, thereby contributing to the advancement of knowledge about DNA sequences. Programa Doutoral em Matemátic
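
    To make the notion of an inter-word distance distribution concrete, here is a minimal sketch. It rests on assumptions the abstract does not confirm: the distance is taken as the difference between the start positions of consecutive occurrences of a word, and overlapping occurrences are counted. The function names are hypothetical and this is not the thesis's own procedure.

```python
from collections import Counter

def word_positions(sequence, word):
    """Start indices of every occurrence of word in sequence, overlaps included."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are found too.
        start = sequence.find(word, start + 1)
    return positions

def inter_word_distances(sequence, word):
    """Differences between start positions of consecutive occurrences."""
    positions = word_positions(sequence, word)
    return [b - a for a, b in zip(positions, positions[1:])]

def distance_distribution(sequence, word):
    """Empirical relative frequency of each observed distance (empty if < 2 occurrences)."""
    distances = inter_word_distances(sequence, word)
    if not distances:
        return {}
    counts = Counter(distances)
    total = len(distances)
    return {d: c / total for d, c in sorted(counts.items())}
```

    For example, in the toy sequence "ACGACGTTACG" the word "ACG" starts at positions 0, 3 and 8, so the observed distances are 3 and 5. Since there are 4^k possible words of length k, the number of such distributions grows exponentially with k, which is the data-reduction problem the abstract refers to.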

    Characterization of a large cluster of HIV-1 A1 infections detected in Portugal and connected to several Western European countries

    HIV-1 subtypes are associated with differences in transmission and disease progression. Thus, the existence of geographic hotspots of subtype diversity deepens the complexity of HIV-1/AIDS control. The already high subtype diversity in Portugal seems to be increasing due to infections with sub-subtype A1 virus. We performed phylogenetic analysis of 65 A1 sequences newly obtained from 14 Portuguese hospitals and 425 closely related database sequences. 80% of the Portuguese A1 isolates gathered in a main phylogenetic clade (MA1). Six transmission clusters were identified in MA1, encompassing isolates from Portugal, Spain, France, and the United Kingdom. The most common transmission route identified was men who have sex with men. The origin of MA1 was linked to Greece, with the first introduction to Portugal dating back to 1996 (95% HPD: 1993.6-1999.2). Individuals infected with MA1 virus had lower viral loads and higher CD4+ T-cell counts than those infected with subtype B. The expanding A1 clusters in Portugal are connected to other European countries and share a recent common ancestor with the Greek A1 outbreak. The recent expansion of this HIV-1 subtype might be related to slower disease progression, leading to a population-level delay in its diagnosis. Supported by FEDER, COMPETE, and FCT through the projects NORTE-01-0145-FEDER-000013, POCI-01-0145-FEDER-007038 and IF/00474/2014; FCT PhD scholarship PDE/BDE/113599/2015; FCT contract FCT IF/00474/2014; European Funds through grant BEST HOPE (project funded through HIVERA, grant 249697) and by FCT PTDC/DTP-EPI/7066/2014. The Global Health and Tropical Medicine Center is funded through FCT (UID/Multi/04413/2013). We would like to acknowledge all the patients and health care professionals from the Portuguese hospitals that contributed in some way to this study.

    Characterisation of microbial attack on archaeological bone

    As part of an EU-funded project to investigate the factors influencing bone preservation in the archaeological record, more than 250 bones from 41 archaeological sites in five countries spanning four climatic regions were studied for diagenetic alteration. Sites were selected to cover a range of environmental conditions and archaeological contexts. Microscopic and physical (mercury intrusion porosimetry) analyses of these bones revealed that the majority (68%) had suffered microbial attack. Furthermore, significant differences were found between animal and human bone in both the state of preservation and the type of microbial attack present. These differences in preservation might result from differences in the early taphonomy of the bones. © 2003 Elsevier Science Ltd. All rights reserved.

    NEOTROPICAL ALIEN MAMMALS: a data set of occurrence and abundance of alien mammals in the Neotropics

    Biological invasion is one of the main threats to native biodiversity. For a species to become invasive, it must be voluntarily or involuntarily introduced by humans into a nonnative habitat. Mammals were among the first taxa to be introduced worldwide for game, meat, and labor, yet the number of species introduced in the Neotropics remains unknown. In this data set, we make available occurrence and abundance data on mammal species that (1) transposed a geographical barrier and (2) were voluntarily or involuntarily introduced by humans into the Neotropics. Our data set is composed of 73,738 historical and current georeferenced records on alien mammal species, of which around 96% correspond to occurrence data on 77 species belonging to eight orders and 26 families. Data cover 26 continental countries in the Neotropics, ranging from Mexico and its frontier regions (southern Florida and coastal-central Florida in the southeast United States) to Argentina, Paraguay, Chile, and Uruguay, and the 13 countries of the Caribbean islands. Our data set also includes neotropical species (e.g., Callithrix sp., Myocastor coypus, Nasua nasua) considered alien in particular areas of the Neotropics. The most numerous species in terms of records are Bos sp. (n = 37,782), Sus scrofa (n = 6,730), and Canis familiaris (n = 10,084); 17 species were represented by only one record (e.g., Syncerus caffer, Cervus timorensis, Cervus unicolor, Canis latrans). Primates have the highest number of species in the data set (n = 20 species), partly because of uncertainties regarding taxonomic identification of the genus Callithrix, which includes the species Callithrix aurita, Callithrix flaviceps, Callithrix geoffroyi, Callithrix jacchus, Callithrix kuhlii, Callithrix penicillata, and their hybrids. This unique data set will be a valuable source of information for invasion risk assessments, biodiversity redistribution and conservation-related research. There are no copyright restrictions. Please cite this data paper when using the data in publications. We also request that researchers and teachers inform us of how they are using the data.

    Neotropical freshwater fisheries: A dataset of occurrence and abundance of freshwater fishes in the Neotropics

    The Neotropical region hosts 4225 freshwater fish species, ranking first among the world's most diverse regions for freshwater fishes. Our NEOTROPICAL FRESHWATER FISHES data set is the first to produce a large-scale Neotropical freshwater fish inventory, covering the entire Neotropical region from Mexico and the Caribbean in the north to the southern limits in Argentina, Paraguay, Chile, and Uruguay. We compiled 185,787 distribution records, with unique georeferenced coordinates, for the 4225 species, represented by occurrence and abundance data. The numbers of species in the most numerous orders are as follows: Characiformes (1289), Siluriformes (1384), Cichliformes (354), Cyprinodontiformes (245), and Gymnotiformes (135). The most recorded species was the characid Astyanax fasciatus (4696 records). We registered 116,802 distribution records for native species, compared to 1802 distribution records for nonnative species. The main aim of the NEOTROPICAL FRESHWATER FISHES data set was to make these occurrence and abundance data accessible for international researchers to develop ecological and macroecological studies, from local to regional scales, with focal fish species, families, or orders. We anticipate that the NEOTROPICAL FRESHWATER FISHES data set will be valuable for studies on a wide range of ecological processes, such as trophic cascades, fishery pressure, the effects of habitat loss and fragmentation, and the impacts of species invasion and climate change. There are no copyright restrictions on the data. Please cite this data paper when using the data in publications.

    NEOTROPICAL XENARTHRANS: a data set of occurrence of xenarthran species in the Neotropics

    Xenarthrans (anteaters, sloths, and armadillos) have essential functions for ecosystem maintenance, such as insect control and nutrient cycling, playing key roles as ecosystem engineers. Because of habitat loss and fragmentation, hunting pressure, and conflicts with domestic dogs, these species have been threatened locally, regionally, or even across their full distribution ranges. The Neotropics harbor 21 species of armadillos, 10 anteaters, and 6 sloths. Our data set includes the families Chlamyphoridae (13), Dasypodidae (7), Myrmecophagidae (3), Bradypodidae (4), and Megalonychidae (2). We have no occurrence data on Dasypus pilosus (Dasypodidae). Regarding Cyclopedidae, until recently, only one species was recognized, but new genetic studies have revealed that the group is represented by seven species. In this data paper, we compiled a total of 42,528 records of 31 species, represented by occurrence and quantitative data, totaling 24,847 unique georeferenced records. The geographic range is from the southern United States, Mexico, and Caribbean countries at the northern portion of the Neotropics, to the austral distribution in Argentina, Paraguay, Chile, and Uruguay. Regarding anteaters, Myrmecophaga tridactyla has the most records (n = 5,941), and Cyclopes sp. have the fewest (n = 240). The armadillo species with the most data is Dasypus novemcinctus (n = 11,588), and the fewest data are recorded for Calyptophractus retusus (n = 33). With regard to sloth species, Bradypus variegatus has the most records (n = 962), and Bradypus pygmaeus has the fewest (n = 12). Our main objective with Neotropical Xenarthrans is to make occurrence and quantitative data available to facilitate more ecological research, particularly if we integrate the xenarthran data with other data sets of Neotropical Series that will become available very soon (i.e., Neotropical Carnivores, Neotropical Invasive Mammals, and Neotropical Hunters and Dogs). Therefore, studies on trophic cascades, hunting pressure, habitat loss, fragmentation effects, species invasion, and climate change effects will be possible with the Neotropical Xenarthrans data set. Please cite this data paper when using its data in publications. We also request that researchers and teachers inform us of how they are using these data.